Run the cudf-polars test suite against DaskEngine and RayEngine #22381

rapids-bot[bot] merged 28 commits into rapidsai:main
```yaml
# (rapidsmpf compatibility already validated in rapidsmpf CI)
matrix_filter: map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by([(.PY_VER|split(".")|map(tonumber)), (.CUDA_VER|split(".")|map(tonumber))]))
build_type: pull-request
container-options: "--cap-add CAP_SYS_PTRACE --shm-size=8g --ulimit=nofile=1000000:1000000"
```
These changes are necessary to provide enough resources for UCX. For reference, we already do the same for UCXX and RapidsMPF, both of which need it for the same reason.
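As an aside, the `matrix_filter` line in that diff is a jq program. A rough Python equivalent, purely as an illustrative sketch (the real filter runs as jq in CI):

```python
# Rough Python rendering of the jq filter: keep amd64 entries, group by CUDA
# major version, and keep the entry with the highest (PY_VER, CUDA_VER) per group.
from itertools import groupby

def version_tuple(version: str) -> tuple[int, ...]:
    # "12.4" -> (12, 4)
    return tuple(int(part) for part in version.split("."))

def filter_matrix(matrix: list[dict]) -> list[dict]:
    # map(select(.ARCH == "amd64"))
    amd64 = [entry for entry in matrix if entry["ARCH"] == "amd64"]
    # jq's group_by sorts by the grouping key first, so sort before groupby
    amd64.sort(key=lambda entry: version_tuple(entry["CUDA_VER"])[0])
    groups = groupby(amd64, key=lambda entry: version_tuple(entry["CUDA_VER"])[0])
    # map(max_by([PY_VER, CUDA_VER])) within each CUDA-major group
    return [
        max(group, key=lambda e: (version_tuple(e["PY_VER"]), version_tuple(e["CUDA_VER"])))
        for _, group in groups
    ]
```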
The ConditionalJoin lowering only triggered the multi-partition fallback
when the smaller side had more than one partition, missing asymmetric
cases such as (left=2, right=1).
In those cases no Repartition was inserted, so on multi-rank engines the
single right partition only existed on one rank. Peer ranks executed the
conditional join against an empty right side and silently dropped rows.
For example, with num_ranks=2 a join_where producing 90 rows on CPU
returned only 54 rows on RayEngine, missing the 36 rows from rank 1.
Fix this by gating the fallback on output_count > 1, i.e.
max(left_count, right_count) > 1. This ensures the smaller side is
repartitioned into a single broadcastable partition whenever either
input is multi-partition. The single-partition equi-case remains
unchanged.
Surfaced while enabling DaskEngine and RayEngine in the cudf-polars test
fixture matrix. The failing cases:
test_join_conditional[{dask,ray}-9-{True,False}]
reported:
DataFrames are different (height mismatch)
[left]: 90
[right]: 54
With this fix all 16 cases pass (4 engines × 2 reverse × 2
max_rows_per_partition), and SPMD now correctly emits the fallback
warning for max_rows_per_partition=9.
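A toy illustration of the predicate change (hypothetical counts, not cudf-polars API calls):

```python
# The asymmetric case from above: left has 2 partitions, right has 1.
left_count, right_count = 2, 1
output_count = max(left_count, right_count)

old_trigger = min(left_count, right_count) > 1  # old gate: smaller side is multi-partition
new_trigger = output_count > 1                  # new gate: either side is multi-partition

# The old gate misses this case, so no Repartition was inserted.
assert (old_trigger, new_trigger) == (False, True)
```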
```diff
 fallback_msg = "ConditionalJoin not supported for multiple partitions."
 if left_count < right_count:
-    if left_count > 1 or dynamic_planning:
+    if output_count > 1 or dynamic_planning:
```
Fixed a bug in the conditional-join fallback logic. We need to repartition the smaller table to a single broadcastable partition whenever either side has more than one partition.
Previously the fallback only triggered when the smaller side itself had multiple partitions, which breaks asymmetric cases like (left=2, right=1). In that situation the single right partition only exists on one rank, so peer ranks execute the conditional join against an empty right side and silently drop rows.
@TomAugspurger and @rjzamora, does that sound correct?
I'll defer to Rick on this...
Though if the new way is correct, then maybe the logic would be a bit more obvious if we combine the first two if conditions?
```python
if (left_count < right_count) and (output_count > 1 or dynamic_planning):
    left = Repartition(left.schema, left)
    pi_left[left] = PartitionInfo(count=1)
    _fallback_inform(fallback_msg, config_options)
elif output_count > 1 or dynamic_planning:
    right = Repartition(right.schema, right)
    pi_right[right] = PartitionInfo(count=1)
    _fallback_inform(fallback_msg, config_options)
```

Then the repeated condition will maybe be a bit clearer (that's true for `left_count < right_count`, but the second condition `output_count > 1 or dynamic_planning` isn't).
At least I hope those are logically equivalent.
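A quick brute-force check (a sketch; `trigger` stands in for `output_count > 1 or dynamic_planning`) suggests they are:

```python
from itertools import product

def nested(smaller_is_left: bool, trigger: bool) -> str | None:
    # Original shape: branch on which side is smaller, then test the trigger.
    if smaller_is_left:
        return "repartition-left" if trigger else None
    return "repartition-right" if trigger else None

def combined(smaller_is_left: bool, trigger: bool) -> str | None:
    # Suggested shape: fold the trigger into the first branch.
    if smaller_is_left and trigger:
        return "repartition-left"
    if trigger:
        return "repartition-right"
    return None

assert all(nested(s, t) == combined(s, t) for s, t in product([True, False], repeat=2))
```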
I guess we are maybe disabling dynamic planning for some join tests? By default, dynamic-planning is always on, so we always repartition the smaller side.
> I guess we are maybe disabling dynamic planning for some join tests? By default, dynamic-planning is always on, so we always repartition the smaller side.

Maybe, it only happens when running with multiple GPUs.
> Though if the new way is correct, then maybe the logic would be a bit more obvious if we combine the first two if conditions?
Yes, and we can even simplify a bit more. Done.
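One possible shape of that simplification, as a sketch reusing the names from the suggestion above (the merged code may differ):

```python
# Hoist the shared trigger; only the choice of which side to broadcast stays nested.
if output_count > 1 or dynamic_planning:
    if left_count < right_count:
        left = Repartition(left.schema, left)
        pi_left[left] = PartitionInfo(count=1)
    else:
        right = Repartition(right.schema, right)
        pi_right[right] = PartitionInfo(count=1)
    _fallback_inform(fallback_msg, config_options)
```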
bdice left a comment
Approving CI and packaging!
It looks like the … Part of that is because we're running these tests against multiple versions of polars. I think we could get away with running the full suite of tests (all tests, all engines) against only the latest version of polars. But I also wonder whether we're still unexpectedly creating expensive resources like Ray actors in too many places.
pentschev left a comment
Left a couple of comments but I won't block this PR. I don't think it's wise for us to hardcode test details to fit CI; rather, we should make them configurable and make CI fit that instead.
I think this was just a slower runner. Note that its sibling job only took 16 minutes. In fact, I do not even think …
CodeRabbit walkthrough

This PR extends cudf-polars' test infrastructure to support multi-rank Ray streaming execution alongside SPMD. Ray is added as an optional dependency, container runtime options are configured for CI, and tests are refactored to use conditional fixtures and warning helpers that adapt behavior based on engine type and rank configuration.

Changes:
- Ray Dependency & Infrastructure Setup
- Test Suite Refactoring for Multi-Rank Support
mroeschke left a comment
Just one comment about adding a run_constraints for ray in the conda recipe.
```yaml
common:
  - output_types: [conda, requirements, pyproject]
    packages:
      - ray>=2.55.1
```
In conda/recipes/cudf-polars/recipe.yaml, we'll probably want to add a

```yaml
run_constraints:
  - ray >=2.55.1
```

to mirror this constraint.
Ray is still optional; this is just for the CI testing.
Right, IIRC run-constraints is like specifying a version constraint for optional runtime dependencies (like what got added to the pyproject.toml)
https://rattler-build.prefix.dev/latest/reference/recipe_file/#run-constraints
> Packages that are optional at runtime but must obey the supplied additional constraint if they are installed.
But can be done in a follow up since the CI is currently green
Ah, I didn't know that, thanks.
But yes, let's do it in a follow-up PR.
/merge
Run the cudf-polars test suite against DaskEngine and RayEngine (rapidsai#22381)

Builds on the cached `streaming_engines` fixture from rapidsai#22364, which amortizes SPMD bootstrap via `_reset()`, and extends the same pattern to Dask and Ray.

With this change, the test matrix runs against:

`["in-memory", "spmd", "spmd-small", "dask", "ray"]`

subject to package availability and `rrun` gating.

We might change the different setups later, but for now CI runs:

| Engine | Block Size(s) | GPU Configuration |
|----------------|-----------------------|-------------------|
| `SPMDEngine` | `"medium"`, `"small"` | Single GPU |
| `DaskEngine` | `"medium"` | Single GPU |
| `RayEngine` | `"medium"` | Two GPUs |

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
- Matthew Murray (https://github.com/Matt711)
- Bradley Dice (https://github.com/bdice)
- Peter Andreas Entschev (https://github.com/pentschev)
- Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#22381
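For illustration, a hedged sketch of what a parametrized engine fixture along these lines could look like (`streaming_engines` is named in the PR; the body here is hypothetical, not the actual cudf-polars test code):

```python
import pytest

ENGINES = ["in-memory", "spmd", "spmd-small", "dask", "ray"]

@pytest.fixture(params=ENGINES)
def streaming_engines(request):
    name = request.param
    if name == "dask":
        pytest.importorskip("dask")  # skip if the optional package is absent
    elif name == "ray":
        pytest.importorskip("ray")   # Ray stays an optional dependency
    return name
```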